A Theory of Higher Order Probabilities
Abstract
We set up a general framework for higher order probabilities. A simple HOP (Higher Order Probability space) consists of a probability space and an operation PR such that, for every event A and every real closed interval Δ, PR(A,Δ) is the event that A's "true" probability lies in Δ. (The "true" probability can be construed here either as the objective probability, or the probability assigned by an expert, or the one assigned eventually in a fuller state of knowledge.) In a general HOP the operation PR has also an additional argument ranging over an ordered set of time-points, or, more generally, over a partially ordered set of stages; PR(A,t,Δ) is the event that A's probability at stage t lies in Δ. First we investigate simple HOPs and then the general ones. Assuming some intuitively justified axioms, we derive the most general structure of such a space. We also indicate various connections with modal logic.¹

¹ A part of this paper has been included in a talk given at an NSF symposium on foundations of probability and causality, organized by W. Harper and B. Skyrms at UC Irvine, July 1985. I wish to thank the organizers for the opportunity to discuss and clarify some of these ideas.

Introduction

The assignment of probabilities is the most established way of measuring uncertainties on a quantitative scale. In the framework of subjective probability, the probabilities are interpreted as someone's (the agent's) degrees of belief. Since justified belief amounts to knowledge, the assignment of probabilities, in as much as it can be justified, expresses knowledge. Indeed, knowledge of probabilities appears to be the basic kind of knowledge that is provided by the experimental sciences today. This is knowledge of a partial, or incomplete, nature, but not in the usual sense of "partial". Usually we mean by "partial knowledge" knowledge of some, but not all, of the facts in a certain domain. But knowing that a given coin is unbiased does not enable one to deduce any non-tautological Boolean combination of propositions which describe outcomes in the next, say, fifty tosses. And yet it constitutes very valuable knowledge about these very same outcomes. What is the objective content of this knowledge? What kind of fact is the fact that the true probability of "heads" is 0.5, i.e., that the coin is unbiased? I shall not enter here into these classical problems.² I take it for granted that, among probability assignments, some are more successful, or better tuned to the actual world, than others. Consequently probability assignments are themselves subject to judgement and evaluation. Having, for example, to estimate the possibility of rain, I might give it, going by the sky's appearance, 70%. But I shall be highly uncertain about my estimate and will adopt the different value given, five minutes later, in the weather forecast. Thus we have two levels of uncertainty:

1. Uncertainty concerning the occurrence of a certain event, expressed through the assignment of probabilities.
2. Uncertainty concerning the probability values assigned in 1.

² My Salzburg paper [1983] has been devoted to these questions. The upshot of the analysis there has been that even a "purely subjective" probability implies a kind of factual claim, for one can assess its success in the actual world. Rather than two different kinds, subjective and objective probabilities are better regarded as two extremes of a spectrum.
When this second level is itself expressed by assigning probabilities we get second order probabilities. An example of a second order probability is furnished by a cartoon in "The New Yorker" showing a forecaster making the following announcement: "There is now 60% chance of rain tomorrow, but there is 70% chance that later this evening the chance of rain tomorrow will be 80%." Just as we can iterate modal or epistemic operators, so in the system to be presented here we can iterate the probability-assignment operator to any depth. The goal of this paper is to present a general and adequate semantics for higher order probabilities and to obtain, via representation theorems, nice, easily understood structures which give us a handle on the situation.

The basic structure to be defined here is a HOP (Higher Order Probability space). A simple HOP is based on a field of events, F, and on a binary operator PR( , ) which associates with every event A and every real closed interval Δ an event PR(A,Δ) in F. The intended meaning is that PR(A,Δ) is the event that A's true probability lies in the interval Δ. "True probability" can be understood here as the probability assigned by an ideal expert or by someone fully informed. It is, however, up to us (or to the agent) to decide what in the given context constitutes an "ideal expert" or "someone fully informed". If "full information" means knowing all the facts then, of course, the true (unknown to us) probability has only two values, 0 and 1; this will make the HOP trivial in a certain sense. In the other extreme, the agent may regard himself as being already fully informed, and this leads to the "opposite" trivialization of the HOP. Generally, the agent will regard the expert as being more knowledgeable than himself, but not omniscient; e.g., the expert might know the true bias of a coin but not the outcomes of future tossings, or he might have statistical information for estimating the bias, which the agent lacks. The agent himself at some future time can be cast in the role of being "fully informed". Thus, if P is the forecaster's present probability function and if PR represents his state of knowledge later in the evening, then his announcement in "The New Yorker" cartoon can be summed up as follows, where A = "tomorrow it will rain":

P(A) = .6        P(PR(A, [.8, .8])) = .7

In order to represent knowledge at different stages, we make PR into a 3-place operator: PR(A,t,Δ) is the event that the probability of A at stage t lies in Δ. The stages can be time-points, in which case t ranges over some ordered set. More generally, the set of stages is only partially ordered, where s < t if the knowledge at stage t includes the knowledge at stage s. (Different agents may thus be represented in the structure.) This is how a HOP is defined in general. We shall first establish the properties of simple HOPs, then use them to derive those of the more general spaces. We shall also define, in a separate section, a formal logical calculus, to be called probability logic, which is naturally associated with simple HOPs. Various modalities can be reconstructed within this calculus. The general HOPs give rise to stage-dependent modalities whose calculus will be outlined at the end of the paper.

The import of the subject for various branches of philosophy and for the foundations of probability is obvious.
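As a small aside, the cartoon forecaster's figures can be realized in a toy two-stage model. The sketch below is mine, not the paper's: it assumes that the evening chance of rain takes only two values, .8 or some residual value c, and it anticipates the expectation property of the kernel derived later (Lemma 1), which then forces the value of c.

```python
# Toy realization of the cartoon announcement (hypothetical modelling choice:
# the evening chance of rain is assumed to take only two values).
p_high = 0.7            # P(evening chance will be .8)
chance_high = 0.8
p_now = 0.6             # current probability of rain, P(A)

# Coherence with P(A) = expected evening chance (the discrete form of Lemma 1)
# determines the residual value c, taken with probability 1 - p_high:
c = (p_now - p_high * chance_high) / (1 - p_high)
print(round(c, 4))      # 0.1333

# Sanity check: the expected evening chance equals the current P(A).
assert abs(p_high * chance_high + (1 - p_high) * c - p_now) < 1e-12
```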
Also obvious is its bearing upon applied probabilistic reasoning in distributed networks, and upon efforts to incorporate such reasoning in AI systems.

Mathematically, most of this paper is rather easy. Our goal has not been to prove difficult theorems, but to clarify some basic concepts and to outline a general, conceptually "clean" framework within which one can use freely and to good effect statements such as: "With probability 0.7 Adam will know at stage 3 Bob's probability for the event A, with error 0.01" (where Adam and Bob are either people or processors). Statements of this form express intuitive thinking which may underlie involved technical proofs; to use them openly and precisely can help us as a guide for finding and organizing our arguments.

A theoretic framework for higher order probabilities may also yield insights into systems of reasoning which employ non-probabilistic certainty measures. For when probability is itself treated like a random variable, we can use various methods of "safe" estimation which do not necessarily yield a probability measure. For example, define the certainty measure of an event A to be the largest α such that, with probability 1, the probability of A is ≥ α. This is only one, apparently the most conservative, measure among various measures that can be used.

Higher order probabilities have been considered by de Finetti, but rejected by him owing to his extreme subjectivist views. Savage considered the possibility but did not take it up, fearing that the higher order probabilities would reflect back on the ground level, leading to inconsistencies. Instances of higher order probabilities figure in works of Good [1965] and Jaynes [1958]. More recent philosophical works are by Demeter [1985], Gardenfors [1975] (for qualitative probabilities), Miller [1960], Skyrms [1980 A], [1980 B], who did much to clarify matters, van Fraassen [1984], and others.

Due to limitations of space and deadline I have not entered into details of various proofs. Some of the material has been abridged; I have included some illustrative examples of simple HOPs, but not the more interesting ones of general HOPs (which arise naturally in distributed systems). Also, the bibliography is far from complete.

Simple HOPs

Definition and Basic Properties

As in Kolmogoroff's framework [1933] we interpret propositions as subsets of some universal set, say W, and we refer to them as events. We can regard W as the set of all possible worlds. Thus we have X = set of all worlds in which X is true, and we get the following standard correspondence:

∨ (disjunction) ↔ ∪ (union)
∧ (conjunction) ↔ ∩ (intersection)
¬ (negation) ↔ complementation

Terminology: A Boolean algebra (of sets) is a class of sets closed under finite unions (and intersections) and under complementation (with respect to some presupposed universal set, in our case W). A field is a Boolean algebra closed under countable unions (what is also known as a σ-algebra). The field (Boolean algebra) generated by a class S of sets is the smallest field (Boolean algebra) which contains S as a subclass. Note that in generating a Boolean algebra we apply finitary operations only, whereas in generating a field infinitary countable operations are used. A field is countably generated if it has a countable set of generators. All probabilities are assumed here to be countably additive.
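To fix a concrete finite representation that the numerical sketches later in this section reuse, here is a minimal illustration of the set-theoretic reading of propositions just described; the particular worlds and events are hypothetical.

```python
# Propositions are identified with the sets of worlds at which they hold
# (hypothetical finite example: four worlds labelled 0..3).
W = frozenset(range(4))

A = frozenset({0, 1})          # the proposition A holds exactly in worlds 0 and 1
B = frozenset({1, 2})

A_or_B  = A | B                # disjunction   <->  union
A_and_B = A & B                # conjunction   <->  intersection
not_A   = W - A                # negation      <->  complementation

# In the finite case every subset of W is an event, so the power set of W is
# at once the Boolean algebra and the field generated by the singletons {x}.
```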
A HOP is a 4-tuple (W, F, P, PR), where F is a field of subsets of W, to be called events, P is a probability over F, and PR is a mapping associating with every A ∈ F and every real closed interval Δ an event PR(A,Δ) in F:

PR : F × {real closed intervals} → F

As explained in the introduction, PR(A,Δ) is the event that the true (or the eventual, or the expert-assigned) probability of A lies in Δ. P is the agent's current subjective probability. Among the closed intervals we include also the empty interval, ∅. The minimal and maximal elements of F are, respectively, 0 and 1; that is: 0 = empty subset of W = False, 1 = W = True. In the explanations I shall use "probability" both for the agent's current subjective probability as well as for the true, or eventual, one; the context indicates the intended reading.

The following axioms are postulated for a HOP:

(I) PR(A, [0,1]) = 1
(For every A, the event that A's probability lies in [0,1] is W, i.e., true.)

(II) PR(A, ∅) = 0
(That A's probability lies in the empty interval is the empty event, i.e., false.)

(III) If Δ_1 ∪ Δ_2 is an interval then PR(A, Δ_1 ∪ Δ_2) = PR(A, Δ_1) ∪ PR(A, Δ_2)
(A's probability lies in the interval Δ_1 ∪ Δ_2 iff it lies either in Δ_1 or in Δ_2.)

In the following two axioms "n" is a running index ranging over {1, 2, ...}.

(IV) ∩_n PR(A, Δ_n) = PR(A, ∩_n Δ_n)
(A's probability lies in every Δ_n iff it lies in their intersection.)

(V) If, for all n ≠ m, A_n ∩ A_m = ∅, then ∩_n PR(A_n, [α_n, β_n]) ⊆ PR(∪_n A_n, [Σ_n α_n, Σ_n β_n])
(For pairwise disjoint A_n's, if A_n's probability lies in [α_n, β_n], n = 1, 2, ..., then the probability of ∪_n A_n lies in [Σ_n α_n, Σ_n β_n].)

Note that axioms (I)-(V) involve only W, F and PR. The crucial axiom which connects PR with P will be stated later.

Theorem 1. For every HOP, H = (W, F, P, PR), there is a mapping p which associates with every x in W a probability p_x over F such that

(1) PR(A,Δ) = {x : p_x(A) ∈ Δ}

The mapping p is uniquely determined by (1) and can be defined by:

(2) p_x(A) = inf{α : x ∈ PR(A, [0,α])}

as well as by:

(2') p_x(A) = sup{α : x ∈ PR(A, [α,1])}.

Vice versa, if, for every x ∈ W, p_x is a probability over F such that {x : p_x(A) ∈ Δ} is in F for all A ∈ F and all real closed Δ, and if we use (1) as a definition of PR, then axioms (I)-(V) are satisfied.

We call p the kernel of the HOP. The proof of Theorem 1 is nothing more than a straightforward derivation of all the required details from the axioms, using (2) as the definition of p. (The "vice versa" part is even more immediate than the first part.) We can now extend PR and define PR(A,Ξ), for an arbitrary subset Ξ of reals, as {x : p_x(A) ∈ Ξ}. If Ξ is a Borel set then PR(A,Ξ) is in F. The meaning of p_x is obvious: it is the probability which corresponds to the maximal state of knowledge in world x, the distribution chosen by the expert of that world.

Notation: For a real number α, PR(A,α) =df PR(A, [α,α]).

The picture is considerably simpler in the discrete case, where W is countable. Assuming with no loss of generality that {x} ∈ F for every x ∈ W, the probability of some A ⊆ W is simply the sum of the probabilities of the worlds in A. In that case we can eliminate the closed intervals and consider only the special cases PR(A,α), where α ranges over [0,1]; also our 5 axioms can be replaced by 3 simpler ones. Discrete cases arise in many situations and are very useful as illustrative examples. But to consider only the discrete case is highly restrictive.
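To make the "vice versa" direction of Theorem 1 concrete in the finite discrete case, the following sketch (with hypothetical kernel values of my own choosing) starts from a family p_x given as a row-stochastic matrix and defines PR by (1); two of the axioms are then spot-checked numerically.

```python
import numpy as np

# Hypothetical finite HOP: three worlds, kernel given as a row-stochastic
# matrix with p[x][y] = p_x({y}).  PR is *defined* via (1).
W = range(3)
p = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])

def p_x(x, A):
    """p_x(A): sum of p_x({y}) over the worlds y in the event A."""
    return sum(p[x, y] for y in A)

def PR(A, interval):
    """PR(A, [a, b]) as a set of worlds, following definition (1)."""
    a, b = interval
    return frozenset(x for x in W if a <= p_x(x, A) <= b)

A = {0, 2}
print(PR(A, (0.0, 1.0)))     # axiom (I): all of W
print(PR(A, (0.4, 0.6)))     # worlds whose expert puts A's probability in [.4, .6]

# Axiom (III) spot-check: [0, .5] and [.5, 1] are intervals whose union is [0, 1].
assert PR(A, (0.0, 1.0)) == PR(A, (0.0, 0.5)) | PR(A, (0.5, 1.0))
```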
Notation: For x, y ∈ W and A ∈ F, put p(x,A) =df p_x(A) and (assuming {y} ∈ F) p(x,y) =df p_x({y}), and P(x) =df P({x}).

In the discrete case P is obviously determined by the values P(x), x ∈ W. Thus, ordering W, we can represent P as a probability vector (a countable vector of non-negative entries which sum up to 1). Similarly the kernel p becomes a probability matrix (a countable square matrix in which every row is a probability vector). Examples (i) and (ii) in the Examples subsection can serve to illustrate the situation (the discussion there presupposes, however, the next subsection). Mathematically, what we have got is a Markov process (with initial probability P and transition probabilities p(x,·), x ∈ W). But the interpretation is altogether different from the usual interpretation of such a structure. The connection between P and the kernel p is established in the sixth axiom.

Axiom (VI) and Its Consequences

Let P(A|B) be the conditional probability of A, given B. It is defined, in the case that P(B) ≠ 0, as P(A∩B)/P(B). It is what the agent's probability for A should be had he known B.

Axiom (VIw) If P(PR(A, [α,β])) ≠ 0 then α ≤ P(A | PR(A, [α,β])) ≤ β.

Axiom (VIw) (the weak form of the forthcoming Axiom (VI)) is a generalization of Miller's Principle to the case of interval-based events. Rewritten in our notation, Miller's Principle is: P(A | PR(A,α)) = α. Axiom (VIw) expresses the following rule: My probability for A should be no less than α and no more than β, were I to know that in a more informed state my probability for A will be within these bounds. Plausible as it sounds, the use of the hypothetical "were I to know that..." needs in this context some clarification. Now a well-known way of explicating conditional probabilities is through conditional bets. Using such bets, van Fraassen [1984] gives a Dutch-book argument for the Principle: its violation makes possible a system of bets (with odds in accordance with the agent's probabilities) in which the agent will incur a net loss in all circumstances. In this argument PR(A,α) is interpreted as the event that the agent's probability for A at a certain future time will be α, in which case he should accept at that time bets with odds α. The same kind of Dutch-book can be constructed if Axiom (VIw) is violated. (Here it is crucial that we use an interval; the argument fails if we replace [α,β] by a non-convex Borel set.)

Axiom (VI) is the interval-based form of the stronger version of Miller's Principle which was suggested by Skyrms [1980 A].

Axiom (VI) If C is a finite intersection of events of the form PR(B,Δ), and if P(C ∩ PR(A, [α,β])) ≠ 0, then α ≤ P(A | C ∩ PR(A, [α,β])) ≤ β.

The same intuition which prescribes (VIw) prescribes (VI); also here the violation of the axiom makes possible a Dutch-book against the agent. What is essential is that events of the form PR(B,Δ) be, in principle, knowable to the agent, i.e., be known (if true) in the maximal states of knowledge as defined by our structure.³

³ It is important to restrict C in Axiom (VI) to an intersection of such events. The removal of this restriction will cause the p_x's to be two-valued functions, meaning that all facts are known in the maximal knowledge states.

In what follows, integrating a function f(t) with respect to a probability m is written as ∫ f(t) m(dt).

Lemma 1. Axiom (VIw) implies that the following holds for all A ∈ F:

(3) P(A) = ∫ p(x,A) P(dx)

The proof consists in applying the formula P(A) = Σ_i P(A|B_i)·P(B_i), where the B_i's form a partition, passing to the limit and using the definition of an integral. The implication (3) ⇒ (VIw) is not true in general.
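Continuing the hypothetical kernel from the previous sketch, the following adds a prior P that is invariant under the kernel (the discrete counterpart of (3), spelled out as (3d) below) and checks the discrete form of Lemma 1 together with the bound demanded by Axiom (VIw) for one interval.

```python
import numpy as np

# Same hypothetical kernel as before; P is chosen so that P = P @ p.
p = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
P = np.array([0.3, 0.3, 0.4])

def prob(event):                     # P(event) for an event given as a set of worlds
    return sum(P[x] for x in event)

def p_x(x, A):                       # p_x(A) = p(x, A)
    return sum(p[x, y] for y in A)

A = {0, 2}

# Discrete form of (3): P(A) = sum_x p(x,A) * P(x).
assert abs(prob(A) - sum(p_x(x, A) * P[x] for x in range(3))) < 1e-12

# Axiom (VIw) for the interval [.4, .6]:
B = {x for x in range(3) if 0.4 <= p_x(x, A) <= 0.6}   # the event PR(A, [.4, .6])
cond = prob(A & B) / prob(B)                           # P(A | PR(A, [.4, .6]))
assert 0.4 <= cond <= 0.6
print(cond)                                            # 0.5
```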
Note that in the discrete case (3) becomes:

(3d) P(x) = Σ_y P(y)·p(y,x)

which means that the probability vector is an eigen-vector of the kernel.

Definition. Call two worlds x, y ∈ W epistemically equivalent (or, for short, equivalent), and denote it by x ≃ y, if p_x = p_y. For S a class of events, define K[S] to be the field generated by all events of the form PR(A,Δ), A ∈ S, Δ a real closed interval.

Epistemic equivalence means having the same maximal knowledge. Evidently x ≃ y iff, for all A and all Δ, x ∈ PR(A,Δ) ⇔ y ∈ PR(A,Δ); this is equivalent to: for all C ∈ K[F], x ∈ C ⇔ y ∈ C. If K[F] is generated by the countably many generators X_n, n = 0, 1, ..., then the equivalence classes are exactly all non-empty intersections ∩_n X_n′, where each X_n′ is either X_n or its complement. Hence the equivalence classes are themselves in K[F]; they are exactly the atoms of this field. The next lemma shows that the condition that K[F] be countably generated is rather mild, for it holds whenever F itself is countably generated (which is the common state of affairs):

Lemma 2. If S is either countable or a countably generated field, then K[S] is countably generated.

(As generators for K[S] one can take all PR(A,Δ), A ∈ S, Δ a rational closed interval; the second claim is proved by showing that if S′ is a Boolean algebra that generates the field S then K[S′] = K[S].)

Terminology: A 0-set is a set of probability 0. Something is said to hold for almost all x if it holds for all x except for a 0-set. The probability in question is P, unless specified otherwise.

Theorem 2. If F is countably generated then Axiom (VI) is equivalent to each of the following conditions:

(A) (3) holds (for all A) and the following is true: Let C_x be the epistemic equivalence class to which x belongs; then p_x(C_x) = 1 for almost all x.

(B) (3) holds and, for almost all x, for all A:

(4) p_x(A) = ∫ p_y(A) p_x(dy)

The proof that Axiom (VI) is equivalent to (A) and implies (B) uses only basic measure theory. The present proof of (B) ⇒ (A) relies on advanced ergodic theory⁴ and I do not know if this can be avoided. Fortunately the rest of this paper does not rely on this implication (except the corresponding implication in Theorem 3).

⁴ I am thankful to my colleagues at the Hebrew University H. Furstenberg, I. Katzenelson and B. Weiss for their help in this item. Needless to say, errors, if any, are my sole responsibility.

Note that in the discrete case (4) is equivalent to:

(4d) p(x,z) = Σ_y p(x,y)·p(y,z)

(4d) means that the kernel, as a matrix, is equal to its square.

Let {E_u : u ∈ U} be the family of epistemic equivalence classes, with different indices attached to different classes. Let P_u be the common p_x for x ∈ E_u; let m be the probability, defined for all V ⊆ U such that ∪_{u∈V} E_u ∈ F, by:

m(V) = P(∪_{u∈V} E_u)

Then (A) is equivalent to the following condition: